PQ-Learning: An Efficient Robot Learning Method for Intelligent Behavior Acquisition
Abstract
This paper presents an efficient reinforcement learning method, called PQ-learning, for intelligent behavior acquisition by an autonomous robot. The method uses a special action-value propagation technique, combining spatial propagation and temporal propagation, to achieve fast learning convergence in large state spaces. Compared with approaches in the literature, the proposed method offers three benefits for robot learning. First, it is a general method that should be applicable to most reinforcement learning tasks. Second, learning is guaranteed to converge to the optimum, and it converges much faster than the traditional Q-learning and Q(λ)-learning methods. Third, it supports both self-directed and teacher-directed learning, where the teacher helps by directing the robot's exploration rather than by explicitly providing labels or ground truth as in the supervised learning regime. The proposed method was tested on a simulated robot navigation-learning problem. The results show that it significantly outperforms the Q(λ)-learning algorithm in learning speed in both the self-directed and teacher-directed learning regimes.
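For orientation, the traditional Q-learning baseline that the abstract compares against can be sketched as below. This is a minimal tabular sketch and not PQ-learning itself: the abstract does not specify the spatial and temporal propagation steps, so they are not shown. The function names `q_learning` and `corridor_step`, the six-state corridor task, and all parameter values are assumptions made for illustration only.

```python
import random


def q_learning(n_states, n_actions, step, episodes=200, alpha=0.5,
               gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Standard one-step tabular Q-learning (baseline, not PQ-learning)."""
    rng = random.Random(seed)
    # One row of action values per state, initialised to zero.
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0  # every episode starts in state 0 (assumption)
        for _ in range(max_steps):
            # Epsilon-greedy selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s2, r, done = step(s, a)
            # One-step backup: only the visited (state, action) pair changes.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


def corridor_step(s, a, n=6):
    """Hypothetical toy task: a corridor of n cells with the goal at the right end.

    Action 1 moves right, action 0 moves left; reaching the goal pays 1.
    """
    s2 = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
    done = s2 == n - 1
    return s2, (1.0 if done else 0.0), done


if __name__ == "__main__":
    q = q_learning(6, 2, corridor_step)
    for state, values in enumerate(q):
        print(state, [round(v, 3) for v in values])
```

Because each backup changes only the visited state-action pair, reward information moves back toward the start state slowly, roughly one state per successful episode at first. That slow spread is the convergence cost in large state spaces that the abstract says its propagation technique is meant to reduce.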
Publication date: 2001